Steven Ponce

The Sherlock Holmes Canon Thematic Word Networks

Each of the 15 most-discussed stories is linked to its five most distinctive dialogue words. Cluster size reflects vocabulary uniqueness. Purple cluster highlights Sign of the Four’s distinctive vocabulary.

Categories: TidyTuesday, Data Visualization, R Programming, 2025

Network analysis of Sherlock Holmes stories using TF-IDF to identify distinctive dialogue vocabulary. Built with R, tidytext, and ggraph to visualize thematic word relationships across 15 stories from the Holmes canon.

Published: November 16, 2025

Figure 1: Network graph showing 15 Sherlock Holmes stories (teal nodes) connected to their five most distinctive dialogue words (small grey nodes). The Sign of the Four stands out with a large purple cluster of connected words, indicating it has the most unique vocabulary among the stories. Other stories like Hound of the Baskervilles, Study in Scarlet, and Valley of Fear show smaller, more dispersed word clusters.
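
The "distinctiveness" behind the figure is a TF-IDF score. As a rough illustration only, the sketch below reproduces the calculation in base R on invented counts. It assumes the usual definition (term frequency multiplied by the natural log of the number of books divided by the number of books containing the word), which is my understanding of what `tidytext::bind_tf_idf()` computes. The `counts` data frame and its values are hypothetical.

```r
# Toy word counts per book (invented numbers, for illustration only)
counts <- data.frame(
  book = c("A", "A", "A", "B", "B", "C", "C"),
  word = c("treasure", "holmes", "watson", "hound", "holmes", "moor", "holmes"),
  n    = c(4, 10, 6, 5, 15, 2, 8)
)

# Term frequency: share of a book's counted words taken up by each word
counts$tf <- counts$n / ave(counts$n, counts$book, FUN = sum)

# Inverse document frequency: ln(number of books / books containing the word).
# Each row is one book-word pair, so counting rows per word counts books.
n_books <- length(unique(counts$book))
books_with_word <- ave(rep(1, nrow(counts)), counts$word, FUN = sum)
counts$idf <- log(n_books / books_with_word)

# TF-IDF is the product of the two
counts$tf_idf <- counts$tf * counts$idf

# Highest scores first
counts[order(-counts$tf_idf), ]
```

A word that appears in every book, like "holmes" in this toy table, gets an IDF of zero and therefore a TF-IDF of zero, however often it is used.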

Steps to Create this Graphic

1. Load Packages & Setup

```{r}
#| label: load
#| warning: false
#| message: false
#| results: "hide"

## 1. LOAD PACKAGES & SETUP ----
suppressPackageStartupMessages({
if (!require("pacman")) install.packages("pacman")
pacman::p_load(
    tidyverse,     # Easily Install and Load the 'Tidyverse'
    ggtext,        # Improved Text Rendering Support for 'ggplot2'
    showtext,      # Using Fonts More Easily in R Graphs
    janitor,       # Simple Tools for Examining and Cleaning Dirty Data
    scales,        # Scale Functions for Visualization
    tidygraph,     # A Tidy API for Graph Manipulation
    igraph,        # Network Analysis and Visualization
    ggraph,        # An Implementation of Grammar of Graphics for Graphs and Networks
    ggrepel,       # Automatically Position Non-Overlapping Text Labels with 'ggplot2'
    tidytext       # Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools
)
})

### |- figure size ----
camcorder::gg_record(
  dir    = here::here("temp_plots"),
  device = "png",
  width  = 10,
  height = 8,
  units  = "in",
  dpi    = 320
)

# Source utility functions
suppressMessages(source(here::here("R/utils/fonts.R")))
source(here::here("R/utils/social_icons.R"))
source(here::here("R/utils/image_utils.R"))
source(here::here("R/themes/base_theme.R"))
```

2. Read in the Data

```{r}
#| label: read
#| include: true
#| eval: true
#| warning: false

tt <- tidytuesdayR::tt_load(2025, week = 46)

holmes <- tt$holmes |> clean_names()

tidytuesdayR::readme(tt)
rm(tt)
```

3. Examine the Data

```{r}
#| label: examine
#| include: true
#| eval: true
#| results: 'hide'
#| warning: false

glimpse(holmes)
skimr::skim(holmes) |> summary()
```

4. Tidy Data

```{r}
#| label: tidy-fixed
#| warning: false

holmes_clean <- holmes |>
  filter(!is.na(text), nchar(text) > 5) |>
  filter(
    !str_detect(text, "^[A-Z ]+$"),
    !str_detect(text, "^(Table of|CHAPTER|Part [IVX]+|\\d+$)")
  ) |>
  mutate(
    has_quotes = str_detect(text, '"'),
    # Heuristic speaker attribution from dialogue tags. The (?i) flag makes every
    # alternative case-insensitive, so "said I" also matches text like "said it".
    speaker = case_when(
      str_detect(text, "(?i)said Holmes|Holmes said|Holmes replied|Holmes answered|Holmes asked") ~ "Holmes",
      str_detect(text, "(?i)said I|I said|I replied|I answered|I asked") ~ "Watson",
      str_detect(text, "(?i)he said|she said|he replied|she replied") ~ "Other",
      has_quotes ~ "Unknown",
      TRUE ~ "Narrative"
    ),
    dialogue_text = str_extract_all(text, '"([^"]*)"') |>
      map_chr(~ {
        if (length(.x) > 0) {
          str_remove_all(.x, '"') |>
            str_trim() |>
            str_c(collapse = " ")
        } else {
          NA_character_
        }
      })
  )

# Word counts & TF–IDF
book_words <- holmes_clean |>
  filter(speaker %in% c("Holmes", "Watson")) |>
  filter(!is.na(dialogue_text)) |>
  select(book, dialogue_text) |>
  unnest_tokens(word, dialogue_text) |>
  anti_join(stop_words, by = "word") |>
  filter(nchar(word) >= 4) |>
  count(book, word, sort = TRUE)

book_tfidf <- book_words |>
  bind_tf_idf(word, book, n) |>
  arrange(desc(tf_idf))

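# Top 15 books by number of distinct dialogue words after filtering
# (book_words holds one row per book-word pair)
top_books <- book_words |>
  count(book, sort = TRUE) |>
  slice_head(n = 15) |>
  pull(book)

# Top five words per book by TF-IDF; slice_max() keeps ties by default,
# so a book can contribute more than five words
distinctive_words <- book_tfidf |>
  filter(book %in% top_books) |>
  group_by(book) |>
  slice_max(tf_idf, n = 5) |>
  ungroup()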

edges <- distinctive_words |>
  select(from = word, to = book, weight = tf_idf)

graph <- graph_from_data_frame(edges, directed = FALSE)
V(graph)$degree <- degree(graph)

tbl_graph <- as_tbl_graph(graph)

# Short book labels and node attributes
book_labels <- tibble(book = unique(distinctive_words$book)) |>
  mutate(
    book_label = book |>
      str_remove("^The Adventure of the ") |>
      str_remove("^The Adventure of ") |>
      str_remove("^The ") |>
      str_remove("^A ")
  )

tbl_graph <- tbl_graph |>
  activate(nodes) |>
  left_join(book_labels, by = c("name" = "book")) |>
  mutate(
    node_type = if_else(name %in% top_books, "Book", "Word"),
    size_metric = if_else(node_type == "Book", degree, 1),
    node_type_fct = factor(node_type, levels = c("Book", "Word")),
    display_label = if_else(node_type == "Book", book_label, NA_character_),
    # Flag the Sign of the Four book node (word nodes never match this pattern)
    is_sign_cluster = str_detect(name, "Sign of the Four")
  )

# Mark edges connected to "Sign of the Four"
tbl_graph <- tbl_graph |>
  activate(edges) |>
  mutate(
    from_node = .N()$name[from],
    to_node = .N()$name[to],
    is_sign_edge = from_node == "The Sign of the Four" |
      to_node == "The Sign of the Four"
  )

# Create layout
set.seed(221)
layout_tbl <- ggraph::create_layout(tbl_graph, layout = "fr")

# Position for "Sign of the Four"
sign_pos <- layout_tbl |>
  as_tibble() |>
  filter(node_type == "Book", str_detect(name, "Sign of the Four")) |>
  slice(1)

annotation_df <- tibble(
  x = sign_pos$x + 0.5,
  y = sign_pos$y + 3,
  xend = sign_pos$x,
  yend = sign_pos$y,
  label = "'Sign of the Four' features<br>the most distinctive vocabulary."
)
```
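
The speaker attribution and dialogue extraction at the top of this chunk can be tried on a few toy sentences. The sketch below is base R rather than stringr, the sample lines are invented (not taken from the dataset), and the stricter pattern is only one possible alternative, not the rule used in the analysis.

```r
# Toy lines (invented for illustration; not from the dataset)
lines <- c(
  '"You see, but you do not observe," said Holmes.',
  '"I have no idea," said I.',
  '"He said it was nothing," the inspector muttered.',
  'The fog lay thick over Baker Street.'
)

# Same alternation as the Watson rule above. (?i) applies to the whole
# pattern, so "said I" also matches "said it" in the third line.
watson_pat <- "(?i)said I|I said|I replied|I answered|I asked"
watson_hit <- grepl(watson_pat, lines, perl = TRUE)

# A stricter variant (an assumption, not the analysis rule): case-sensitive,
# with word boundaries around the pronoun.
strict_pat <- "\\bsaid I\\b|\\bI (said|replied|answered|asked)\\b"
watson_strict <- grepl(strict_pat, lines, perl = TRUE)

# Extract the text between straight double quotes, one string per line;
# lines without quotes become NA, as in the chunk above.
quoted <- regmatches(lines, gregexpr('"[^"]*"', lines))
dialogue <- vapply(quoted, function(q) {
  if (length(q) == 0) return(NA_character_)
  paste(trimws(gsub('"', "", q, fixed = TRUE)), collapse = " ")
}, character(1))

data.frame(watson_hit, watson_strict, dialogue)
```

On these lines the original pattern flags the third sentence as Watson because of "said it", while the stricter variant does not. Whether that matters for the real text depends on how often such phrases occur in quoted lines.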

5. Visualization Parameters

```{r}
#| label: params
#| include: true
#| warning: false

### |-  plot aesthetics ----
colors <- get_theme_colors(
    palette = list(
        book_col = "#0f9a8a",      
        word_col = "#b5beca",      
        highlight_col = "#9b59b6"  
    )
)

### |- titles and caption ----
title_text <- str_glue("The Sherlock Holmes Canon<br>Thematic Word Networks")

subtitle_text <- str_glue(
    "Each of the 15 most-discussed stories is linked to its five most distinctive dialogue words.<br>",
    "Cluster size reflects vocabulary uniqueness. Purple cluster highlights Sign of the Four's distinctive vocabulary."
)

caption_text <- create_social_caption(
    tt_year = 2025,
    tt_week = 46,
    source_text = "{ sherlock R package }"
    )

### |-  fonts ----
setup_fonts()
fonts <- get_font_families()

### |-  plot theme ----
# Start with base theme
base_theme <- create_base_theme(colors)

# Add weekly-specific theme elements
weekly_theme <- extend_weekly_theme(
  base_theme,
  theme(
    # Text styling
    plot.title = element_markdown(
      face = "bold", family = fonts$title, size = rel(1.4),
      color = colors$title, margin = margin(b = 10), hjust = 0
    ),
    plot.subtitle = element_text(
      face = "italic", family = fonts$subtitle, lineheight = 1.2,
      color = colors$subtitle, size = rel(0.9), margin = margin(b = 20), hjust = 0
    ),

    ## Grid
    # panel.grid.major.y = element_blank(),
    # panel.grid.minor = element_blank(),
    # panel.grid.major.x = element_line(color = "gray90", linewidth = 0.3),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.grid = element_blank(),

    # Axes
    axis.title = element_text(size = rel(0.8), color = "gray30"),
    axis.text = element_text(color = "gray30"),
    axis.text.y = element_text(size = rel(0.85)),
    axis.ticks = element_blank(),

    # Facets
    strip.background = element_rect(fill = "gray95", color = NA),
    strip.text = element_text(
      face = "bold",
      color = "gray20",
      size = rel(1),
      margin = margin(t = 8, b = 8)
    ),
    panel.spacing = unit(2, "lines"),

    # Legend elements
    legend.position = "none",
    legend.title = element_text(
      family = fonts$subtitle,
      color = colors$text, size = rel(0.8), face = "bold"
    ),
    legend.text = element_text(
      family = fonts$subtitle,
      color = colors$text, size = rel(0.7)
    ),
    legend.margin = margin(t = 15),

    # Plot margin
    plot.margin = margin(20, 20, 20, 20)
  )
)

# Set theme
theme_set(weekly_theme)
```

6. Plot

```{r}
#| label: plot
#| warning: false

### |-  main plot ----
p <- 
  ggraph(layout_tbl) +

  # Geoms
  geom_edge_link(
    aes(
      color = is_sign_edge,
      alpha = is_sign_edge
    )
  ) +
  scale_edge_color_manual(
    values = c("TRUE" = colors$palette$highlight_col, "FALSE" = "grey70"),
    guide = "none"
  ) +
  scale_edge_alpha_manual(
    values = c("TRUE" = 0.28, "FALSE" = 0.25),
    guide = "none"
  ) +
  geom_node_point(
    data = ~ filter(., node_type == "Book"),
    aes(size = size_metric * 1.35),
    color = if_else(
      layout_tbl |> filter(node_type == "Book") |> pull(is_sign_cluster),
      colors$palette$highlight_col,
      colors$palette$book_col
    ),
    alpha = 0.25,
    show.legend = FALSE
  ) +
  geom_node_point(
    aes(size = size_metric),
    shape = 21,
    fill = case_when(
      layout_tbl$is_sign_cluster ~ colors$palette$highlight_col, # Sign of the Four book node
      layout_tbl$node_type == "Book" ~ colors$palette$book_col,
      TRUE ~ colors$palette$word_col
    ),
    color = if_else(
      layout_tbl$is_sign_cluster,
      colors$palette$highlight_col,
      colors$background
    ),
    stroke = if_else(layout_tbl$is_sign_cluster, 0.8, 0.30),
    alpha = 0.70,
    show.legend = FALSE
  ) +
  geom_node_text(
    data = ~ filter(., node_type == "Book"),
    aes(label = display_label),
    repel = TRUE,
    size = 3.0,
    fontface = "bold",
    family = "text",
    color = "grey10",
    box.padding = unit(0.3, "lines"),
    point.padding = unit(0.3, "lines"),
    segment.size = 0.25,
    segment.color = "grey65"
  ) +
  geom_richtext(
    data = annotation_df,
    aes(x = x, y = y, label = label),
    family = "text",
    size = 3.0,
    color = "grey20",
    fill = alpha(colors$background, 0.5),
    label.colour = NA,
    lineheight = 1.05,
    label.padding = unit(0.1, "lines")
  ) +
  # Scales
  scale_size_continuous(range = c(2.4, 10), guide = "none") +
  # Labs
  labs(
    title = title_text,
    subtitle = subtitle_text,
    caption = caption_text
  ) +
  # Theme
  theme(
    plot.title = element_markdown(
      size = rel(1.85),
      family = fonts$title,
      face = "bold",
      color = colors$title,
      lineheight = 1.15,
      margin = margin(t = 8, b = 5)
    ),
    plot.subtitle = element_markdown(
      size = rel(0.85),
      family = fonts$subtitle,
      color = alpha(colors$subtitle, 0.88),
      lineheight = 1.2,
      margin = margin(t = 2, b = 15)
    ),
    plot.caption = element_markdown(
      size = rel(0.55),
      family = "Arial",
      color = colors$caption,
      hjust = 0,
      lineheight = 1.3,
      margin = margin(t = 12, b = 5)
    ),
    axis.text.y = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.grid = element_blank()
  )
```
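
Book nodes in the plot are sized by `size_metric`, which step 4 sets to each book's degree in the undirected book-word graph. Because `slice_max()` keeps ties by default, a book whose cutoff TF-IDF score is shared by several words ends up with more edges than requested and a larger node. A minimal base-R sketch of that effect on invented scores follows. The `scores` table and the `top_with_ties()` helper are hypothetical stand-ins, not part of the analysis.

```r
# Toy TF-IDF table (invented values); book "X" has four tied scores
scores <- data.frame(
  book   = c(rep("X", 6), rep("Y", 4)),
  word   = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"),
  tf_idf = c(0.9, 0.5, 0.5, 0.5, 0.5, 0.1, 0.8, 0.6, 0.4, 0.2)
)

# Keep the top `n` scores per book, retaining every row tied at the cutoff.
# This mimics dplyr::slice_max() with its default with_ties = TRUE.
top_with_ties <- function(df, n) {
  keep <- ave(df$tf_idf, df$book,
              FUN = function(s) rank(-s, ties.method = "min") <= n)
  df[as.logical(keep), ]
}

edges <- top_with_ties(scores, n = 2)

# Each kept row is one book-word edge, so a book's degree is its row count
book_degree <- table(edges$book)
book_degree
```

Here book "X" keeps five words even though only two were requested, so its node would be drawn larger than book "Y". Passing `with_ties = FALSE` to `slice_max()` would cap every book at exactly `n` words, at the cost of breaking ties arbitrarily.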

7. Save

```{r}
#| label: save
#| warning: false

### |-  plot image ----  
save_plot(
  plot = p,
  type = "tidytuesday",
  year = 2025,
  week = 46,
  width = 10,
  height = 8
)
```

8. Session Info

R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] here_1.0.1      tidytext_0.4.2  ggrepel_0.9.6   ggraph_2.2.1   
 [5] igraph_2.1.1    tidygraph_1.3.1 scales_1.3.0    janitor_2.2.0  
 [9] showtext_0.9-7  showtextdb_3.0  sysfonts_0.8.9  ggtext_0.1.2   
[13] lubridate_1.9.3 forcats_1.0.0   stringr_1.5.1   dplyr_1.1.4    
[17] purrr_1.0.2     readr_2.1.5     tidyr_1.3.1     tibble_3.2.1   
[21] ggplot2_3.5.1   tidyverse_2.0.0 pacman_0.5.1   

loaded via a namespace (and not attached):
 [1] gridExtra_2.3      httr2_1.0.6        rlang_1.1.6        magrittr_2.0.3    
 [5] snakecase_0.11.1   compiler_4.4.0     systemfonts_1.1.0  vctrs_0.6.5       
 [9] pkgconfig_2.0.3    crayon_1.5.3       fastmap_1.2.0      magick_2.8.5      
[13] labeling_0.4.3     utf8_1.2.4         rmarkdown_2.29     markdown_1.13     
[17] tzdb_0.5.0         ragg_1.3.3         camcorder_0.1.0    bit_4.5.0         
[21] xfun_0.49          cachem_1.1.0       jsonlite_1.8.9     SnowballC_0.7.1   
[25] tweenr_2.0.3       parallel_4.4.0     R6_2.5.1           stringi_1.8.4     
[29] Rcpp_1.0.13-1      knitr_1.49         base64enc_0.1-3    gitcreds_0.1.2    
[33] Matrix_1.7-0       timechange_0.3.0   tidyselect_1.2.1   rstudioapi_0.17.1 
[37] yaml_2.3.10        viridis_0.6.5      codetools_0.2-20   curl_6.0.0        
[41] lattice_0.22-6     withr_3.0.2        evaluate_1.0.1     polyclip_1.10-7   
[45] xml2_1.3.6         pillar_1.9.0       janeaustenr_1.0.0  renv_1.0.3        
[49] generics_0.1.3     vroom_1.6.5        rprojroot_2.0.4    hms_1.1.3         
[53] commonmark_1.9.2   munsell_0.5.1      glue_1.8.0         tools_4.4.0       
[57] tokenizers_0.3.0   graphlayouts_1.2.0 grid_4.4.0         gh_1.4.1          
[61] colorspace_2.1-1   repr_1.1.7         ggforce_0.4.2      cli_3.6.4         
[65] rappdirs_0.3.3     textshaping_0.4.0  rsvg_2.6.1         fansi_1.0.6       
[69] viridisLite_0.4.2  svglite_2.1.3      gtable_0.3.6       digest_0.6.37     
[73] tidytuesdayR_1.1.2 gifski_1.32.0-1    htmlwidgets_1.6.4  skimr_2.1.5       
[77] farver_2.1.2       memoise_2.0.1      htmltools_0.5.8.1  lifecycle_1.0.4   
[81] gridtext_0.1.5     bit64_4.5.2        MASS_7.3-60.2     

9. GitHub Repository

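The complete code for this analysis is available in tt_2025_46.qmd (https://github.com/poncest/personal-website/blob/master/data_visualizations/TidyTuesday/2025/tt_2025_46.qmd).

The full repository is at https://github.com/poncest/personal-website/.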

10. References

  1. Data Source:
    • TidyTuesday 2025 Week 46: The Complete Sherlock Holmes (https://github.com/rfordatascience/tidytuesday/blob/main/data/2025/2025-11-18/readme.md)

11. Custom Functions Documentation

📦 Custom Helper Functions

This analysis uses custom functions from my personal module library for efficiency and consistency across projects.

Functions Used:

  • fonts.R: setup_fonts(), get_font_families() - Font management with showtext
  • social_icons.R: create_social_caption() - Generates formatted social media captions
  • image_utils.R: save_plot() - Consistent plot saving with naming conventions
  • base_theme.R: create_base_theme(), extend_weekly_theme(), get_theme_colors() - Custom ggplot2 themes

Why custom functions?
These utilities standardize theming, fonts, and output across all my data visualizations. The core analysis (data tidying and visualization logic) relies only on public CRAN packages: the tidyverse plus tidytext, tidygraph, igraph, ggraph, ggrepel, and ggtext.

Source Code:
View all custom functions → GitHub: R/utils (https://github.com/poncest/personal-website/tree/master/R)


© 2024 Steven Ponce
